Abstract

We provide an elementary proof for a theorem due to Petz and
Réffy which states that for a random $n \times n$ unitary matrix with
distribution given by the Haar measure on the unitary group $U(n)$, the
upper left (or any other) $k \times k$ submatrix converges in distribution,
after multiplying by a normalization factor $\sqrt{n}$ and as $n \to \infty$,
to a matrix of independent complex Gaussian random variables with mean 0
and variance 1.

MSC(2000): 15A52; 60B10. Key words: random matrices, Haar 
measure on the unitary group, Gaussian matrices. 

1 Introduction 

The aim of this paper is to give an alternative, elementary proof of a
theorem first established by Petz and Réffy in [3], concerning the joint
distribution of the upper left $k \times k$ entries of a random unitary
$n \times n$ matrix in the limit $n \to \infty$ and formulated as Theorem 1
below. This theorem is of particular interest in quantum statistical
mechanics, where one often studies the behavior of a small system
(corresponding to dimension $k$) coupled to a heat bath, a much larger
system corresponding to dimension $n$. Specifically, Theorem 1 can be used
for studying the distribution of the conditional wave function of a system
coupled to a heat bath in the relevant limit (in which the size of the heat
bath tends to infinity). As we show in [I], this distribution typically
converges, as a consequence of Theorem 1, to the so-called "GAP" measure
[3], which can thus be regarded as the thermal equilibrium distribution of
the conditional wave function. We explain this application further in
Section 2.

We fix some notation and terminology. Let $\mathbb{P}$ denote probability
and $\mathbb{E}$ expectation, $U(n)$ the group of unitary $n \times n$
matrices, and $\mathrm{Haar}(U(n))$ the (normalized) Haar measure on this
group, representing the "uniform" probability distribution over $U(n)$. We
write $(a_{ij})$ for the matrix with entries $a_{ij}$. The relevant notion
of convergence of probability distributions is weak convergence, also known
as "convergence in distribution" of the random variables [H Sec. 25]. By a
complex Gaussian random variable $G$ with mean 0 and variance $\sigma^2$ we
mean $G = X + iY$, where $X$ and $Y$ are independent real Gaussian random
variables with means $\mathbb{E} X = 0$ and $\mathbb{E} Y = 0$ and variances
$\mathbb{E} X^2 = \sigma^2/2$ and $\mathbb{E} Y^2 = \sigma^2/2$.
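The definition of a complex Gaussian variable can be illustrated numerically. The following is a minimal sketch, not part of the proof; the helper name `complex_gaussian`, the seed, and the sample size are illustrative assumptions.

```python
import math
import random

def complex_gaussian(rng, sigma2=1.0):
    """Sample G = X + iY with independent real X, Y ~ N(0, sigma2 / 2)."""
    s = math.sqrt(sigma2 / 2.0)  # standard deviation of each real component
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

rng = random.Random(0)  # fixed seed, chosen arbitrarily
samples = [complex_gaussian(rng, sigma2=1.0) for _ in range(100_000)]

# Empirical mean (should be near 0) and second moment E|G|^2 (should be near sigma2 = 1).
mean = sum(samples) / len(samples)
second_moment = sum(abs(g) ** 2 for g in samples) / len(samples)
print(abs(mean), second_moment)
```

Splitting the variance evenly between real and imaginary parts is what makes $\mathbb{E}|G|^2 = \sigma^2$.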

Theorem 1. If $(U_{ij})$ is $\mathrm{Haar}(U(n))$ distributed, then the
upper left (or, in fact, any) $k \times k$ submatrix, multiplied by a
normalization factor $\sqrt{n}$, converges in distribution, as
$n \to \infty$, to a random $k \times k$ matrix $(G_{ij})$ whose entries
$G_{ij}$ are independent complex Gaussian random variables with mean 0 and
variance $\mathbb{E}|G_{ij}|^2 = 1$.

To understand the factor $\sqrt{n}$, note that a column of a unitary
$n \times n$ matrix is a unit vector, and thus a single entry should be of
order $1/\sqrt{n}$. A random $k \times k$ matrix such as $(G_{ij})$,
consisting of independent complex Gaussian variables with mean 0 and
variance 1, is also called "$\sqrt{k}$ times a standard non-selfadjoint
Gaussian matrix."

Theorem 1 is a generalization of the familiar fact that the first $k$
entries of a random unit vector in $\mathbb{R}^n$ (with uniform probability
distribution over the unit sphere), multiplied by a normalization factor
$\sqrt{n}$, converge in distribution to a vector whose $k$ entries are
independent real Gaussian random variables with mean 0 and variance 1.¹
This fact (with $\mathbb{R}^n$ replaced by $\mathbb{C}^n$) is contained in
Theorem 1 by specializing to just the first columns of the matrices
$(U_{ij})$ and $(G_{ij})$.
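The familiar fact about random unit vectors can be checked by simulation. The sketch below assumes the standard construction of a uniform point on the sphere by normalizing a vector of independent standard Gaussians; the dimension, number of trials, and seed are arbitrary illustrative choices.

```python
import math
import random

def random_unit_vector(rng, n):
    """Uniform point on the unit sphere of R^n, obtained by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(1)
n, trials = 200, 2000

# First entry of the unit vector, rescaled by sqrt(n).
first = [math.sqrt(n) * random_unit_vector(rng, n)[0] for _ in range(trials)]

mean_first = sum(first) / trials
var_first = sum(x * x for x in first) / trials  # second moment; should be near 1
print(mean_first, var_first)
```

The rescaled entry has second moment exactly 1 for every $n$, since the $n$ squared entries of a unit vector sum to 1 and are exchangeable; the Gaussian shape emerges only in the limit.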

¹ As a physical interpretation of this fact, consider $N$ classical
particles without interaction in a box $\Lambda \subset \mathbb{R}^3$; a
given energy corresponds to a surface in phase space
$\Lambda^N \times \mathbb{R}^{3N}$ given by $\Lambda^N \times S$, where $S$
is the sphere of appropriate radius $\propto \sqrt{N}$ in momentum space
$\mathbb{R}^{3N}$; assuming a random phase point with micro-canonical
distribution (i.e., uniform on $\Lambda^N \times S$), the marginal
distribution of the momentum of the first particle is, in the limit
$N \to \infty$, Gaussian. This fact is part of the justification of
Maxwell's law of the Gaussian distribution of momenta.
The proof of Petz and Réffy is based on the convergence of the joint
distribution of the eigenvalues of a $k \times k$ submatrix of a unitary
matrix to the corresponding distribution for a $k \times k$ Gaussian
matrix. Our proof, in contrast, is based on the geometric properties of
Gaussian random matrices. While it involves some more cumbersome estimates,
it employs only elementary methods.

2 Application to Typicality of GAP Measures 

We briefly describe the application of Theorem 1 in quantum statistical 
mechanics. 

Consider a quantum system entangled with its environment, so that the
composite has a wave function
$\psi \in \mathcal{H}_{\mathrm{sys}} \otimes \mathcal{H}_{\mathrm{env}}$,
with $\mathcal{H}_{\mathrm{sys}}$ and $\mathcal{H}_{\mathrm{env}}$ the
Hilbert spaces of the system and the environment. Suppose
$\mathcal{H}_{\mathrm{sys}}$ has dimension $k$, while
$\mathcal{H}_{\mathrm{env}}$ has very large dimension $n$. According to the
Schmidt decomposition, every
$\psi \in \mathcal{H}_{\mathrm{sys}} \otimes \mathcal{H}_{\mathrm{env}}$ can
be written as

$$\psi = \sum_{i=1}^{k} c_i \, \chi_i \otimes \phi_i \tag{1}$$

with coefficients $c_i \in \mathbb{C}$, an orthonormal basis
$\{\chi_1, \ldots, \chi_k\}$ of $\mathcal{H}_{\mathrm{sys}}$ and an
orthonormal system $\{\phi_1, \ldots, \phi_k\}$ in
$\mathcal{H}_{\mathrm{env}}$. Relative to any fixed orthonormal basis
$\{b_1, \ldots, b_n\}$ of $\mathcal{H}_{\mathrm{env}}$, the coefficients
$U_{ij} = \langle b_j | \phi_i \rangle$ of the $\phi_i$ form the first $k$
rows of an $n \times n$ unitary matrix, and the uniform distribution over
all $\psi$'s with a given reduced density matrix

$$\rho_{\mathrm{sys}} = \sum_i |c_i|^2 \, |\chi_i\rangle\langle\chi_i| \tag{2}$$

gives rise to (the appropriate marginal of) the Haar measure on $(U_{ij})$.

For reasons we explain below, it is of interest to consider, for a fixed
but typical $\psi$, a random column of $(U_{ij})$, or, equivalently, the
random vector (arising from a random choice of $j$)

$$\psi_{\mathrm{sys}} = \sum_i c_i U_{ij} \chi_i = \langle b_j | \psi \rangle_{\mathrm{env}} \in \mathcal{H}_{\mathrm{sys}}, \tag{3}$$

where the scalar product is a partial scalar product. By Theorem 1, in the
limit $n \to \infty$, each column of $(U_{ij})$ has a Gaussian
distribution, and any two columns are independent; as a consequence, by the
law of large numbers, for typical $\psi$ the empirical distribution of
$\psi_{\mathrm{sys}}$ approximates a Gaussian distribution on
$\mathcal{H}_{\mathrm{sys}}$ with covariance $\rho_{\mathrm{sys}}$.²

This fact is significant for the proof that the thermal equilibrium distri- 
bution of the conditional wave function is the GAP measure, a particular 
probability distribution on the unit sphere of Hilbert space. Let us explain. 

The notion of conditional wave function [2] is a precise mathematical
version of the concept of collapsed wave function. Conditional on the state
$b_j$ of the environment, the conditional wave function
$\psi_{\mathrm{sys}}$ of the system is given by the expression (3) (times a
normalizing factor). Now replace $j$ by a random variable $J$ with the
quantum theoretical probability distribution

$$\mathbb{P}(J = j) = \bigl\| \langle b_j | \psi \rangle_{\mathrm{env}} \bigr\|^2. \tag{4}$$

The resulting random vector $\psi_{\mathrm{sys}}$ is called the conditional
wave function. For example, a system after a quantum measurement is still
entangled with the apparatus, but its collapsed wave function is a
conditional wave function.
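The construction of the conditional wave function from (3) and (4) can be sketched for a small example. The dimensions, the seed, and the coefficient array `psi[a][j]` (system index `a`, environment index `j`) are illustrative assumptions. The final check uses a standard fact not stated in the text: averaging the projector onto the conditional wave function over $J$ reproduces the reduced density matrix.

```python
import math
import random

rng = random.Random(2)
k, n = 2, 6  # hypothetical small dimensions of system and environment

# psi[a][j]: coefficient of chi_a (x) b_j in a random normalized composite state.
psi = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(k)]
norm = math.sqrt(sum(abs(z) ** 2 for row in psi for z in row))
psi = [[z / norm for z in row] for row in psi]

# Born probabilities as in (4): P(J = j) = || <b_j|psi>_env ||^2.
probs = [sum(abs(psi[a][j]) ** 2 for a in range(k)) for j in range(n)]

# Conditional wave functions: the partial scalar product (3), normalized.
cond = [[psi[a][j] / math.sqrt(probs[j]) for a in range(k)] for j in range(n)]

# Reduced density matrix: rho_sys[a][b] = sum_j psi[a][j] * conj(psi[b][j]).
rho_sys = [[sum(psi[a][j] * psi[b][j].conjugate() for j in range(n))
            for b in range(k)] for a in range(k)]

# Average of |psi_sys><psi_sys| over the random index J.
avg = [[sum(probs[j] * cond[j][a] * cond[j][b].conjugate() for j in range(n))
        for b in range(k)] for a in range(k)]

max_diff = max(abs(avg[a][b] - rho_sys[a][b]) for a in range(k) for b in range(k))
print(sum(probs), max_diff)
```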

Now consider a system kept in thermal equilibrium at a temperature
$1/\beta$ by a coupling to a large heat bath. Even if we assume that
$\psi \in \mathcal{H}_{\mathrm{sys}} \otimes \mathcal{H}_{\mathrm{env}}$
(with the environment being the heat bath) is non-random, the conditional
wave function $\psi_{\mathrm{sys}}$ is random, and for typical $\psi$
within the microcanonical ensemble (i.e., for most $\psi$ relative to the
uniform distribution over the subspace corresponding to a narrow energy
interval), the distribution of $\psi_{\mathrm{sys}}$ is a universal
distribution that depends only on $\beta$ (but neither on the details of
the heat bath nor on the basis $\{b_j\}$). As conjectured in [3] and proven
using Theorem 1 in 0], this distribution, the thermal equilibrium
distribution of the conditional wave function, is the
Gaussian-adjusted-projected (GAP) measure associated with the canonical
density matrix of temperature $1/\beta$,

$$\rho_\beta = \frac{1}{Z} e^{-\beta H}, \qquad Z = \operatorname{Tr} e^{-\beta H}. \tag{5}$$

For any density matrix $\rho$, the measure $GAP(\rho)$ is defined as
follows. Let $G(\rho)$ be the Gaussian measure on Hilbert space with
covariance $\rho$; multiply $G(\rho)$ by the density function
$\|\cdot\|^2$ (adjustment factor) to obtain the measure $GA(\rho)$; project
$GA(\rho)$ to the unit sphere in Hilbert space to obtain $GAP(\rho)$.
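The three steps of this definition (Gaussian, adjust, project) can be sketched as an approximate sampler. This is an illustrative sketch only: it assumes a diagonal $\rho$ with hypothetical eigenvalues 0.7 and 0.3, and it realizes the adjustment factor by weighted resampling, which is one possible choice and not taken from the text. The check uses the fact, which follows from the definition, that the mean of $|\psi\rangle\langle\psi|$ under $GAP(\rho)$ equals $\rho$.

```python
import math
import random

def sample_gaussian(rng, p):
    """Sample psi from G(rho) for rho = diag(p): psi_i = sqrt(p_i) * Z_i, with E|Z_i|^2 = 1."""
    s = math.sqrt(0.5)
    return [math.sqrt(pi) * complex(rng.gauss(0, s), rng.gauss(0, s)) for pi in p]

def sample_gap(rng, p, proposals=50_000, draws=20_000):
    """Approximate GAP(rho) samples via weighted resampling of Gaussian proposals."""
    psis = [sample_gaussian(rng, p) for _ in range(proposals)]
    # Adjustment factor ||psi||^2, used as resampling weight (approximates GA(rho)).
    weights = [sum(abs(z) ** 2 for z in psi) for psi in psis]
    chosen = rng.choices(range(proposals), weights=weights, k=draws)
    out = []
    for idx in chosen:
        r = math.sqrt(weights[idx])
        out.append([z / r for z in psis[idx]])  # projection to the unit sphere
    return out

rng = random.Random(3)
p = [0.7, 0.3]  # hypothetical eigenvalues of rho (they sum to 1)
samples = sample_gap(rng, p)

# Diagonal of the empirical density matrix; should be close to p.
occupation = [sum(abs(phi[i]) ** 2 for phi in samples) / len(samples) for i in range(len(p))]
print(occupation)
```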
We now turn to the proof of Theorem 1. 

² This fact is similar to Maxwell's law in the classical setting of
Footnote 1: For a typical phase point on $\Lambda^N \times S$, the
empirical distribution of the momenta (over all $N$ particles) approximates
a Gaussian distribution on $\mathbb{R}^3$ as $N \to \infty$. This follows
using the law of large numbers from the fact, described in Footnote 1, that
the momentum of each particle is Gaussian-distributed, and that the momenta
of different particles are independent.
3 First Part of the Proof: Construction of $U_{ij}$

We write $M_j$ for the $j$-th column of any $n \times n$ matrix $(M_{ij})$
and

$$\langle M_j | M_\ell \rangle = \sum_{i=1}^{n} M_{ij}^* M_{i\ell}, \qquad \|M_j\|^2 = \sum_{i=1}^{n} |M_{ij}|^2. \tag{6}$$

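For $i,j = 1, \ldots, n$ let $G_{ij}$ be i.i.d. complex Gaussian random
variables with mean 0 and variance 1. To the $n$ columns of the matrix
$(G_{ij})$ apply the Gram-Schmidt orthonormalization procedure, and call
the resulting matrix $(U_{ij})$. That is,

$$U_{ij} = \frac{G_{ij} - A_{ij}}{\|G_j - A_j\|} \tag{7}$$

with

$$A_{ij} = \sum_{\ell=1}^{j-1} U_{i\ell} \, \langle U_\ell | G_j \rangle. \tag{8}$$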

The procedure fails if the columns of $(G_{ij})$ are linearly dependent,
but this event has probability 0. Then, as also remarked in [5],
$(U_{ij})$ is $\mathrm{Haar}(U(n))$ distributed because its first column is
uniformly distributed over the unit sphere in $\mathbb{C}^n$, the
distribution of the second column conditional on the first column is
uniform over the unit sphere in the orthogonal complement of the first
column, and, in general, the distribution of the $(j+1)$-st column
conditional on the first $j$ columns is uniform over the unit sphere in the
orthogonal complement of the first $j$ columns; this is exactly the Haar
measure.
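The Gram-Schmidt construction of $(U_{ij})$ from $(G_{ij})$ can be sketched in a few lines. The function names, the small dimension $n = 6$, and the seed are illustrative assumptions; the check only confirms that the resulting columns are orthonormal and does not test the Haar property itself.

```python
import math
import random

def inner(u, v):
    """Scalar product <u|v> = sum_i conj(u_i) * v_i, as in (6)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram_schmidt_columns(G):
    """Orthonormalize the columns of the square matrix G (a list of rows), column by column."""
    n = len(G)
    columns = [[G[i][j] for i in range(n)] for j in range(n)]
    ortho = []
    for g in columns:
        # a is the projection of g onto the span of the previously built columns.
        a = [0j] * n
        for u in ortho:
            coeff = inner(u, g)
            a = [ai + coeff * ui for ai, ui in zip(a, u)]
        diff = [gi - ai for gi, ai in zip(g, a)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in diff))
        ortho.append([x / norm for x in diff])
    # Return in row-major form U[i][j].
    return [[ortho[j][i] for j in range(n)] for i in range(n)]

rng = random.Random(0)
n = 6
s = math.sqrt(0.5)  # each real component has variance 1/2, so E|G_ij|^2 = 1
G = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)] for _ in range(n)]
U = gram_schmidt_columns(G)

# Largest deviation of <U_a|U_b> from the Kronecker delta.
U_cols = [[U[i][j] for i in range(n)] for j in range(n)]
unitarity_error = max(
    abs(inner(U_cols[a], U_cols[b]) - (1.0 if a == b else 0.0))
    for a in range(n) for b in range(n)
)
print(unitarity_error)
```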

Our method of proof is to show that $|\sqrt{n} U_{ij} - G_{ij}|$ is in fact
small if $n$ is large. More precisely, we show that for every
$\varepsilon > 0$,

$$\mathbb{P}\Bigl( \sum_{i,j=1}^{k} \bigl| \sqrt{n} U_{ij} - G_{ij} \bigr| < \varepsilon \Bigr) \to 1 \tag{9}$$

as $n \to \infty$. This is called convergence in probability, and to obtain
the claim of the theorem we use the known fact [H Theorem 25.2, p. 284]
that convergence in probability implies weak convergence (of the joint
distribution of $\sqrt{n} U_{ij}$ for $i,j = 1, \ldots, k$), provided that
all random variables are defined on the same probability space. Here, we
can assume that for all $i,j \in \mathbb{N}$, the $G_{ij}$ are defined on
the same probability space.
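The convergence in probability claimed in (9) can be observed numerically. The sketch below is an illustration under assumed parameters ($k = 2$, $\varepsilon = 0.5$, 200 trials per $n$, fixed seed); it uses the fact that the first $k$ Gram-Schmidt columns depend only on the first $k$ columns of $(G_{ij})$, so only those are generated.

```python
import math
import random

def inner(u, v):
    """Scalar product <u|v> = sum_i conj(u_i) * v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def deviation_sum(rng, n, k):
    """One sample of sum_{i,j<=k} |sqrt(n) U_ij - G_ij|."""
    s = math.sqrt(0.5)
    # cols[j][i] = G_ij for the first k columns only.
    cols = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)] for _ in range(k)]
    ortho = []
    for g in cols:
        diff = list(g)
        for u in ortho:
            coeff = inner(u, g)
            diff = [d - coeff * ui for d, ui in zip(diff, u)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in diff))
        ortho.append([x / norm for x in diff])
    root_n = math.sqrt(n)
    return sum(abs(root_n * ortho[j][i] - cols[j][i]) for i in range(k) for j in range(k))

rng = random.Random(4)
k, eps, trials = 2, 0.5, 200
results = {}
for n in (25, 100, 400):
    devs = [deviation_sum(rng, n, k) for _ in range(trials)]
    mean_dev = sum(devs) / trials
    frac_below = sum(d < eps for d in devs) / trials  # estimate of the probability in (9)
    results[n] = (mean_dev, frac_below)
    print(n, mean_dev, frac_below)
```

In such runs the mean deviation should shrink roughly like $1/\sqrt{n}$, consistent with the estimates derived below.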

4 Second Part of the Proof: Probable Geometry

The proof of (9) is based on the following observations:

• Any two different columns of $(G_{ij})$ tend to be nearly orthogonal.

• Every column of $(G_{ij})$ tends to have norm close to $\sqrt{n}$.

• The size of every single entry, $|G_{ij}|$, stays bounded as $n$ grows.

These statements are to be understood in the sense that they are fulfilled
with high probability for sufficiently large $n$. We now make them precise.
Fix a (small) $\delta > 0$. Choose $R > 0$ so large that

$$\mathbb{P}(|G_{ij}| < R) > 1 - \delta. \tag{10}$$

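Define the following events corresponding to the three bullets above:

$$A^n_{j\ell} := \Bigl\{ |\langle G_j | G_\ell \rangle| < \sqrt{n/\delta} \Bigr\} \tag{11}$$

$$B^n_j := \Bigl\{ \Bigl| \frac{\|G_j\|^2}{n} - 1 \Bigr| < \frac{1}{\sqrt{\delta n}} \Bigr\} \tag{12}$$

$$C^n_{ij} := \bigl\{ |G_{ij}| < R \bigr\} \tag{13}$$

for $i, j, \ell \le k$ with $j \ne \ell$. ($C^n_{ij}$ actually does not
depend on $n$, but never mind.) Each of these events has at least
probability $1 - \delta$: $A^n_{j\ell}$ and $B^n_j$ by Chebyshev's
inequality and $C^n_{ij}$ by (10). Thus, the event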



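$$D^n := \bigcap_{\substack{j,\ell=1 \\ j \ne \ell}}^{k} A^n_{j\ell} \;\cap\; \bigcap_{j=1}^{k} B^n_j \;\cap\; \bigcap_{i,j=1}^{k} C^n_{ij} \tag{14}$$

has at least probability $1 - 2k^2\delta$, as $2k^2$ is the number of
intersecting sets.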
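The Chebyshev step behind the bounds for $A^n_{j\ell}$ and $B^n_j$ can be spelled out as follows. This is a sketch that assumes the thresholds $\sqrt{n/\delta}$ in (11) and $1/\sqrt{\delta n}$ in (12); it uses only that the $G_{ij}$ are independent with $\mathbb{E} G_{ij} = 0$, $\mathbb{E}|G_{ij}|^2 = 1$ and $\mathbb{E}|G_{ij}|^4 = 2$, so that $\operatorname{Var}(|G_{ij}|^2) = 1$.

```latex
% Second moments for j \neq \ell (independence and mean zero remove the cross terms):
\mathbb{E}\,\bigl|\langle G_j | G_\ell \rangle\bigr|^2
  = \sum_{i=1}^{n} \mathbb{E}|G_{ij}|^2 \; \mathbb{E}|G_{i\ell}|^2 = n,
\qquad
\operatorname{Var}\Bigl( \frac{\|G_j\|^2}{n} \Bigr)
  = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}\bigl(|G_{ij}|^2\bigr) = \frac{1}{n}.

% Chebyshev's inequality then bounds the probability of each complement by \delta:
\mathbb{P}\Bigl( \bigl|\langle G_j | G_\ell \rangle\bigr| \ge \sqrt{n/\delta} \Bigr)
  \le \frac{n}{n/\delta} = \delta,
\qquad
\mathbb{P}\Bigl( \Bigl| \frac{\|G_j\|^2}{n} - 1 \Bigr| \ge \frac{1}{\sqrt{\delta n}} \Bigr)
  \le \frac{1/n}{1/(\delta n)} = \delta.
```

Counting the sets gives $k(k-1)$ events of type $A$, $k$ of type $B$ and $k^2$ of type $C$, which is $2k^2$ in total.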
We now show that for sufficiently large $n$, $D^n \subset E^n$, where

$$E^n := \Bigl\{ \sum_{i,j=1}^{k} \bigl| \sqrt{n} U_{ij} - G_{ij} \bigr| < \varepsilon \Bigr\} \tag{15}$$

is the event in the brackets of (9). Since $\delta$ was arbitrary, this
fact implies (9). The remainder of the proof makes no reference to
probabilities, but concerns only the inclusion $D^n \subset E^n$, which can
be regarded as an inclusion between subsets of $\mathbb{C}^{n \times n}$.
Also $A^n_{j\ell}$, $B^n_j$, and $C^n_{ij}$ will from now on be regarded as
subsets of $\mathbb{C}^{n \times n}$. (Now the upper index $n$ in the
notation becomes useful.) We thus regard $G_{ij}$ as fixed numbers, and
assume that the matrix $(G_{ij})$ lies in the set $D^n$. When we refer to
"the condition $A^n_{j\ell}$" we mean the condition that the $n \times n$
matrix $(G_{im})$ lies in the set $A^n_{j\ell}$.
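Membership of a given matrix in $D^n$ is a finite list of inequalities and can be tested directly. The sketch below estimates how often a random $(G_{ij})$ lies in $D^n$ and compares the frequency with the bound $1 - 2k^2\delta$. It assumes the thresholds $\sqrt{n/\delta}$ for the conditions $A^n_{j\ell}$ and $1/\sqrt{\delta n}$ for $B^n_j$; the parameter values, the seed, and the choice $R = \sqrt{\ln(2/\delta)}$ are illustrative assumptions.

```python
import math
import random

def inner(u, v):
    """Scalar product <u|v> = sum_i conj(u_i) * v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def in_D(cols, n, k, delta, R):
    """Check conditions A, B, C for the first k columns; cols[j][i] = G_ij."""
    for j in range(k):
        norm_sq = sum(abs(z) ** 2 for z in cols[j])
        if abs(norm_sq / n - 1.0) >= 1.0 / math.sqrt(delta * n):  # condition B fails
            return False
        for l in range(k):
            if l != j and abs(inner(cols[j], cols[l])) >= math.sqrt(n / delta):  # A fails
                return False
        for i in range(k):
            if abs(cols[j][i]) >= R:  # condition C fails
                return False
    return True

rng = random.Random(5)
n, k, delta, trials = 100, 2, 0.05, 500
# |G_ij|^2 is exponential with mean 1, so P(|G_ij| >= R) = exp(-R^2) = delta / 2 here.
R = math.sqrt(math.log(2.0 / delta))
s = math.sqrt(0.5)

hits = 0
for _ in range(trials):
    cols = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)] for _ in range(k)]
    hits += in_D(cols, n, k, delta, R)

frequency = hits / trials
lower_bound = 1.0 - 2 * k ** 2 * delta
print(frequency, lower_bound)
```

The Chebyshev bounds are far from sharp, so the observed frequency is typically well above the guaranteed lower bound.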
We proceed to show, by induction over $j \in \{1, \ldots, k\}$, that for
sufficiently large $n$ we have that for all $(G_{ij}) \in D^n$, and for all
$i = 1, \ldots, k$,

$$\bigl| \sqrt{n} U_{ij} - G_{ij} \bigr| < \frac{\varepsilon}{k^2} \tag{16}$$

and there are constants $C_1, \ldots, C_k > 0$ such that for sufficiently
large $n$

$$\bigl\| \sqrt{n} U_j - G_j \bigr\| < C_j. \tag{17}$$

From (16) we see that $(G_{ij}) \in E^n$, which is what we need to show.
This induction is the content of the next, and last, section.



5 Third Part of the Proof: Estimates 



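For $j = 1$, note that $U_1 = G_1/\|G_1\|$. By conditions $B^n_1$ and
$C^n_{i1}$,

$$\bigl| \sqrt{n} U_{i1} - G_{i1} \bigr| = \Bigl| \frac{\sqrt{n}}{\|G_1\|} - 1 \Bigr| \, |G_{i1}| < \frac{1}{\sqrt{\delta n}} \, R < \frac{\varepsilon}{k^2} \tag{18}$$

for sufficiently large $n$. By condition $B^n_1$,

$$\bigl\| \sqrt{n} U_1 - G_1 \bigr\| = \Bigl| \frac{\sqrt{n}}{\|G_1\|} - 1 \Bigr| \, \|G_1\| < \frac{1}{\sqrt{\delta n}} \, 2\sqrt{n} = \frac{2}{\sqrt{\delta}} =: C_1. \tag{19}$$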



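We now collect four estimates. For $\ell < j$ and sufficiently large $n$ we
find

\begin{align}
\bigl| \langle G_j | \sqrt{n} U_\ell \rangle \bigr|
  &= \bigl| \langle G_j | \sqrt{n} U_\ell - G_\ell \rangle + \langle G_j | G_\ell \rangle \bigr| \le \tag{20} \\
  &\le \|G_j\| \, \bigl\| \sqrt{n} U_\ell - G_\ell \bigr\| + \bigl| \langle G_j | G_\ell \rangle \bigr|
   < 2\sqrt{n} \, C_\ell + \sqrt{n/\delta} =: C'_\ell \sqrt{n} \tag{21}
\end{align}

where we have used the Cauchy-Schwarz inequality, $A^n_{j\ell}$, $B^n_j$,
and the induction hypothesis (17). As the next estimate, for $i \le k$,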



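\begin{align}
|A_{ij}| = \Bigl| \sum_{\ell=1}^{j-1} U_{i\ell} \langle U_\ell | G_j \rangle \Bigr|
  &\le \sum_{\ell=1}^{j-1} \bigl| \langle G_j | \sqrt{n} U_\ell \rangle \bigr| \, \Bigl| \frac{1}{\sqrt{n}} U_{i\ell} \Bigr| \tag{22} \\
  &< \sum_{\ell=1}^{j-1} C'_\ell \, \frac{1}{\sqrt{n}} \Bigl( \bigl| \sqrt{n} U_{i\ell} - G_{i\ell} \bigr| + |G_{i\ell}| \Bigr) \tag{23} \\
  &< \frac{1}{\sqrt{n}} \sum_{\ell=1}^{j-1} C'_\ell \Bigl( \frac{\varepsilon}{k^2} + R \Bigr) =: \frac{C''_j}{\sqrt{n}} \tag{24}
\end{align}

using (21), the induction hypothesis (16), and $C^n_{i\ell}$. As the third
estimate, for $j \le k$,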

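\begin{align}
\|A_j\|^2 = \sum_{i=1}^{n} |A_{ij}|^2
  &\le \sum_{i=1}^{n} \sum_{\ell=1}^{j-1} |U_{i\ell}|^2 \, \bigl| \langle U_\ell | G_j \rangle \bigr|^2 \tag{25} \\
  &< \sum_{\ell=1}^{j-1} (C'_\ell)^2 \, \|U_\ell\|^2 = \sum_{\ell=1}^{j-1} (C'_\ell)^2 =: C'''_j \tag{26}
\end{align}

using (21), the orthogonality of the columns $U_\ell$, and the fact that
$U_\ell$ is a unit vector. As the last estimate,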



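$$\Bigl| \frac{\sqrt{n}}{\|G_j - A_j\|} - 1 \Bigr| < \frac{C''''_j}{\sqrt{n}}, \qquad C''''_j := 2 \Bigl( \frac{1}{\sqrt{\delta}} + \sqrt{C'''_j} \Bigr), \tag{27}$$

which is easily obtained from
$\|G_j\| - \|A_j\| \le \|G_j - A_j\| \le \|G_j\| + \|A_j\|$ and (26) in the
following way:

\begin{align}
\frac{\sqrt{n}}{\|G_j - A_j\|} - 1
  &\le \frac{\sqrt{n}}{\|G_j\| - \|A_j\|} - 1 \tag{28} \\
  &= \frac{1}{\frac{1}{\sqrt{n}} \|G_j\| - \frac{1}{\sqrt{n}} \|A_j\|} - 1. \tag{29}
\end{align}

Hence, using $B^n_j$ and (26), and since

$$\frac{1}{1 - x} < 1 + 2x \tag{30}$$

for sufficiently small $x > 0$, we obtain that, for sufficiently large $n$,

$$\frac{\sqrt{n}}{\|G_j - A_j\|} - 1 < \frac{1}{1 - 1/\sqrt{\delta n} - \sqrt{C'''_j / n}} - 1 < 2 \Bigl( \frac{1}{\sqrt{\delta n}} + \sqrt{\frac{C'''_j}{n}} \Bigr) = \frac{C''''_j}{\sqrt{n}}. \tag{31}$$

Together with an (even narrower) lower bound obtained by similar arguments,
this yields (27).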
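For completeness, inequality (30) can be verified directly. The range $0 < x < 1/2$ used below is one explicit reading of "sufficiently small" and is an assumption, not taken from the text.

```latex
(1 + 2x) - \frac{1}{1 - x}
  = \frac{(1 + 2x)(1 - x) - 1}{1 - x}
  = \frac{x \, (1 - 2x)}{1 - x} > 0
\qquad \text{for } 0 < x < \tfrac{1}{2}.
```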

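From these four estimates, the first induction claim (16) follows for
$j \le k$ because, for sufficiently large $n$,

\begin{align}
\bigl| \sqrt{n} U_{ij} - G_{ij} \bigr|
  &= \Bigl| \frac{\sqrt{n}}{\|G_j - A_j\|} (G_{ij} - A_{ij}) - G_{ij} \Bigr|
   \le \Bigl| \frac{\sqrt{n}}{\|G_j - A_j\|} - 1 \Bigr| \, |G_{ij}| \; + \tag{32} \\
  &\quad + \frac{\sqrt{n}}{\|G_j - A_j\|} \, |A_{ij}|
   < \frac{C''''_j}{\sqrt{n}} \, R + 2 \, \frac{C''_j}{\sqrt{n}} < \frac{\varepsilon}{k^2}, \tag{33}
\end{align}

where $C''_j$ and $C''''_j$ are the constants from (24) and (27), and where
we have used (27), $C^n_{ij}$, (27) with $C''''_j/\sqrt{n} < 1$, and (24).
The second induction claim (17) follows from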



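\begin{align}
\bigl\| \sqrt{n} U_j - G_j \bigr\|
  &= \Bigl\| \frac{\sqrt{n}}{\|G_j - A_j\|} (G_j - A_j) - G_j \Bigr\| \le \tag{34} \\
  &\le \Bigl| \frac{\sqrt{n}}{\|G_j - A_j\|} - 1 \Bigr| \, \|G_j\| + \frac{\sqrt{n}}{\|G_j - A_j\|} \, \|A_j\| < \tag{35} \\
  &< \frac{C''''_j}{\sqrt{n}} \, 2\sqrt{n} + 2\sqrt{C'''_j} = 2 C''''_j + 2\sqrt{C'''_j} =: C_j, \tag{36}
\end{align}

where $C'''_j$ and $C''''_j$ are the constants from (26) and (27).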

This completes the proof. 



We close with a remark on the parenthesis in Theorem 1: "the upper
left (or, in fact, any) $k \times k$ submatrix." We elucidate the meaning
of "any." To select a $k \times k$ submatrix means to select $k$ rows and
$k$ columns. This selection must be deterministic (i.e., non-random, or at
least independent of the $U_{ij}$) but may depend on $n$. Indeed, if the
selection depended on the $U_{ij}$, one could, for example, select those
rows and columns where $U_{ij}$ happens to be exceptionally close to zero,
which would lead to a different asymptotic distribution. On the other hand,
for a selection depending on $n$, Theorem 1 remains true: to see this,
recall that for a compact group such as $U(n)$, the Haar measure is both
left-invariant and right-invariant; as a consequence,
$\mathrm{Haar}(U(n))$ is invariant under any (non-random) permutation of
either the rows or the columns, and thus all $k \times k$ submatrices have
the same distribution.